ABSTRACT

This brief paper is a response to a reanalysis (Krueger & Zhu, 2003) of a report
(Mayer, Peterson, Myers, Tuttle, & Howell, 2002). The
response is offered by two of the authors (Myers & Mayer) of the original 
report. The original report presented an evaluation of the impact of vouchers 
on students' reading and mathematics achievement and on parents' satisfaction 
with their schools. The evaluation looked at a program in New York City begun 
in 1997 in which 1,300 scholarships (vouchers) worth up to $1,400 each were 
granted, through a lottery, to public-school students in grades K-4 to attend 
private schools. This paper comments on two issues raised in the reanalysis: 
(1) the sensitivity of the findings to different definitions of African 
American; and (2) the use of the kindergarten cohort and other students with 
missing baseline test data when estimating the impact of a voucher offer on 
students' achievement. This paper includes three tables that show the 
reading, mathematics, and composite test scores from the original report;
lengthy footnotes; and nine references. (WFA) 
Mathematica Policy Research (MPR) conducts studies such as the School Choice
Scholarship Foundation (SCSF) evaluation to help inform policy debates. As part of our effort, 
we believe that other researchers should have an opportunity to scrutinize our findings and to 
lend their own perspective to analyses of the primary data. To this end, MPR made available to 
researchers the SCSF data files. 

Recently, Krueger and Zhu (2003) have undertaken analyses of the SCSF data collected by 
MPR and have produced new and important findings that build on the results we presented in our 
final report (Mayer, Peterson, Myers, Tuttle, and Howell 2002).¹ We comment on two issues
they raise in their paper: (1) the sensitivity of the findings to different definitions of African 
American and (2) the use of the kindergarten cohort and other students with missing baseline test 
data when estimating the impact of a voucher offer on students’ achievement.² We turn to each
of these topics below, but first we provide background on the evaluation. 



¹ Krueger and Zhu (2003) also comment on a number of findings in Howell and Peterson
(2002), but our comments do not address these issues or concerns. 

² The comments presented here focus on the impacts of a voucher offer. Reports prepared
by MPR (Peterson, Myers, and Howell 1998; Myers, Peterson, Mayer, Chou, and Howell 2000; 
Mayer, Peterson, Myers, Tuttle, and Howell 2002) present other estimates of the impacts of 
vouchers, including the impact of ever using a voucher and the impact of using a voucher for 
three full years. The impact of a voucher offer shows the potential impact of a voucher policy 
that has parameters similar to those imposed by the SCSF in its allocation of vouchers to children 
in New York City: (1) the amount of the voucher is about $1,400 per year, (2) vouchers are 
available for four years, (3) vouchers can be used to attend a secular or religious private school, 
and (4) eligible students come from low-income families. Furthermore, the impact of a voucher 
offer combines two processes: (1) the take-up rate by families offered the opportunity to use a 
voucher, and (2) the impact of the vouchers on those who use them. The impact of ever using a 
voucher applies only to those who used a voucher and does not address the impact of a broad 
policy that would make vouchers available to all low-income students. The impact of using a 
voucher for three years has the same limitation as the estimate of ever using a voucher and 
assumes that the full difference in the observed averages for the treatment group and the control 
group can be attributed to the experiences of those students who used a voucher for three full 
years and not the experiences of those who used a voucher for one or two years. About 53 
percent of the voucher group used a voucher for three full years, while 74 percent and 64 percent 
used a voucher in the first and second years, respectively. Krueger and Zhu provide a useful 
BACKGROUND OF THE SCSF EVALUATION



In 1997, the SCSF announced that it would provide 1,300 scholarships (vouchers) to low-income
New York City (NYC) public school students in grades K through 4 to attend private
schools (secular or religious). Each scholarship or voucher was worth up to $1,400 annually and
could be used for up to four years. The SCSF received applications from about 11,000 eligible
students from February through April 1997, and a lottery was used to select students who would 
be offered scholarships. In practice, families were randomly selected, and all students within the 
family who were eligible for a voucher were offered it. Those families not selected comprised 
the control group. 
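
The family-level lottery described above can be illustrated with a minimal sketch. It is not the procedure MPR ran: the actual random assignment was stratified (by date of application, school achievement level, and number of eligible children), and the sketch omits stratification entirely. The function name, the family identifiers, and the sample data are hypothetical.

```python
import random


def run_family_lottery(family_children, n_families_offered, seed=0):
    """Randomly select families; every eligible child in a selected
    family is offered a voucher, and all other children are controls.

    Simplifying assumption: a single unstratified draw over families.
    """
    rng = random.Random(seed)
    families = sorted(family_children)
    selected = set(rng.sample(families, n_families_offered))

    offered, control = [], []
    for family in families:
        group = offered if family in selected else control
        group.extend(family_children[family])
    return offered, control


# Hypothetical applicants: family id -> eligible children in that family.
applicants = {
    "fam_a": ["a1", "a2"],
    "fam_b": ["b1"],
    "fam_c": ["c1", "c2", "c3"],
    "fam_d": ["d1"],
}
offered, control = run_family_lottery(applicants, n_families_offered=2)
print("offered:", offered)
print("control:", control)
```

Because selection happens at the family level, siblings always land in the same group, which is why outcomes for children within a family are not independent observations.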

MPR and Paul Peterson from Harvard University collaborated on an evaluation of the 
impact of the vouchers on students’ reading and mathematics achievement and on parents’ 
satisfaction with their schools. We released four reports between 1997 and 2002. Our final 
report (Mayer, Peterson, Myers, Tuttle, and Howell 2002) concluded the following about the 
impacts on the test scores of students who received a voucher offer:³



(continued) 

discussion of the role of measurement error when estimating the impact of ever attending a 
private school or the impact of attending a private school for three years. They show that 
inaccuracies in reporting private school attendance (use of a voucher) will upwardly bias the 
estimated impact of ever attending a private school or attending a private school for three years; 
that is, these estimates may provide an overly optimistic view of the impacts of ever attending a 
private school or attending one for three full years. Given the cautions expressed by Mayer et al. and
Krueger and Zhu about the other assumption made when interpreting the impact of attending a
private school for three years (that impacts are concentrated among only those who use
vouchers for three years), and given the biases that arise with measurement error in the indicator for
private school attendance, one must interpret impacts of attending a private school for three years
very cautiously.

³ During the course of the reanalysis of the test score data, Krueger pointed out a small
amount of variation in the sample weights within stratum (personal communication 2002). The 
two sources of variation in weights included: (1) adjustments for families who applied for 
vouchers multiple times and who were not detected before randomization and (2) a coding error 
that occurred when we created variables that indicated the stratification plan used for the random 
assignment of students to the voucher and control groups. The coding error did not affect the 
actual random assignment; however, it did come into play when we initially used a post- 
stratification procedure to adjust the baseline weights and again when estimating analytic 
models. The original stratification plan for random assignment relied on the date of application 
for a voucher, whether an applicant was attending a public school with lower than average 
achievement, and the number of students in a family who were eligible for a scholarship. After 
random assignment, we learned that some families had incorrectly reported the number of 
children eligible for a scholarship and this information was entered into the data files. This new 
information was used instead of the original information on the number of students eligible for a 
voucher when conducting the initial post-stratification adjustment and when constructing the 
indicators for the random assignment plan (strata) that were later used in the analytic models. 
• On standardized tests, students who were in grades 1 through 4 when they applied for
a voucher (grades 4 through 7 at the conclusion of the study) and were offered a 
voucher generally performed at about the same level as students who were not offered 
a voucher (the control group). That is, after three years, there was no impact of a 
voucher offer on students’ reading and mathematics achievement. 

• The pattern of impacts for Latino students differed from the pattern of impacts for
African American students.⁴ We found no impact of a scholarship offer on the test
scores of Latino students, but we found a statistically significant impact on the test
scores of African American students who were in grades 1 through 4 at the outset of
the study. After three years, the composite test scores (an average of reading and
mathematics test scores) of African American students offered a voucher were about
5.5 percentile points higher (our revised estimate of the analyses presented in that
report is 5.0 percentile points; see footnote 3) than the composite test scores of
African American students in the control group.

• Separate analyses conducted by grade cohort showed that the impact of a voucher
offer for African American students was significant and positive for the oldest
students.
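
Because the composite score is an average of the reading and mathematics scores, the composite impact should equal the average of the two subject impacts, up to rounding. The sketch below checks this for the "Overall - Average" rows of Tables 1 through 3. It assumes the composite is a simple unweighted average and that the same estimation model was applied to all three outcomes; the variable and function names are illustrative only.

```python
# (math impact, reading impact, reported composite impact), percentile points,
# taken from the "Overall - Average" rows of Tables 2, 3, and 1 respectively.
overall_rows = {
    "All students": (1.49, 0.40, 0.95),
    "African American students": (6.42, 3.65, 5.03),
    "Hispanic students": (-0.29, -1.71, -1.00),
}


def composite_impact(math_impact, reading_impact):
    """Average of the two subject impacts (assumes an unweighted composite)."""
    return (math_impact + reading_impact) / 2


for group, (math_imp, read_imp, reported) in overall_rows.items():
    computed = composite_impact(math_imp, read_imp)
    gap = abs(computed - reported)
    print(f"{group}: computed {computed:.3f}, reported {reported:.2f}, gap {gap:.3f}")
```

Gaps of about 0.01 or less are what one would expect from rounding the published figures to two decimals.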



STUDENTS’ RACE AND ETHNICITY AND THE IMPACT OF USING DIFFERENT 
DEFINITIONS 

When the SCSF evaluation was launched, race and ethnicity were not thought to be critical
elements of the study. However, after Peterson and his associates initially found different
impacts for African American and other students in their evaluation of the Washington, D.C.,
voucher experiment, race and ethnicity became a more central issue in the analyses of the SCSF
data (Wolf et al. 2000).⁵ Krueger and Zhu point out that “race is a social construct that varies



(continued) 

During the process of correcting the coding error, MPR took the opportunity to improve the 
nonresponse adjusted weights used in the previous analyses and constructed a full set of new 
nonresponse adjusted weights. A comparison of the results obtained with the new weights and
those in the 2002 report shows that the changes in weights make little substantive difference to the
findings presented by Mayer et al. (2002). For example, the impact on the combined test scores 
for all students in grades 1 through 4 three years after offering the scholarships with the original 
weights was 0.93 and with the revised weights the impact is 0.95. As another example, the 
impact for all African American students in grades 1 through 4 with the original weights was 
5.50 percentile points and with the revised weights is 5.03. Tables 1-3 present the revised 
estimates of impacts of a voucher offer using the new weights and the corrected indicator 
variables for the randomization plan (stratification variables). 

⁴ Race and ethnicity were based on the race and ethnicity of the child’s mother or female
guardian. 

⁵ The final report on the findings from Washington, D.C., suggests that the impact for African
American students disappeared after three years (see Howell and Peterson 2002).
from survey to survey” and that researchers should strive to use a “consistent and clear definition
of race” so that results can be compared across studies. Unfortunately, given the lack of 
consistency in the definition of race, researchers often choose measures that are comparable to 
one survey or study, but not to others. Our research team faced such a choice at the outset of the 
study and chose to use the parent race question that was similar to that used in the Milwaukee 
Parental Choice Program Study, which was conducted in the early 1990s. 

Had we used the Office of Management and Budget guidelines, as Krueger and Zhu point 
out, we would have used the approach of asking separate questions for race and ethnicity. This 
in turn would have allowed us to parse the data into three frequently used definitions of African
American: black non-Hispanic, black Hispanic, and black either (non-Hispanic or Hispanic). The
survey item used in Mayer et al. allowed us to identify only black non-Hispanics and not black
Hispanics.
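
A minimal sketch of how separate race and ethnicity questions would map onto the three definitions named above follows. The function name and the response codings (`race` as a string, `is_hispanic` as a boolean) are hypothetical; they do not reproduce the wording of any survey item used in the evaluation.

```python
def black_definitions(race: str, is_hispanic: bool) -> dict:
    """Flag which of the three definitions a respondent satisfies.

    `race` and `is_hispanic` stand in for answers to two separate
    questions, one on race and one on Hispanic ethnicity (assumed coding).
    """
    is_black = race.strip().lower() == "black"
    return {
        "black_non_hispanic": is_black and not is_hispanic,
        "black_hispanic": is_black and is_hispanic,
        "black_either": is_black,
    }


for race, hispanic in [("black", False), ("black", True), ("white", True)]:
    print(race, hispanic, black_definitions(race, hispanic))
```

A single combined race/ethnicity item cannot populate the second flag, which is the limitation of the survey item described above.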

Krueger and Zhu (2003) used an alternative definition of African American in many of their
analyses. Their definition included students whose mother or father was black (non-Hispanic or
Hispanic), while Mayer et al. defined students as African American if their mothers were black
non-Hispanic. Using Krueger and Zhu’s alternative definition shows that students who were
offered a voucher and in grades 1 through 4 at the time of application (grades 4 through 7 at the
conclusion of the study), on average, experienced a positive, statistically significant impact on
their composite test scores (4.00 percentile points and t = 2.50; or, when covariates are added to
the model, 2.67 and t = 1.78; see Table 5 in Krueger and Zhu). This estimate is lower than
the impact reported in Table 1 (4.00 or 2.67 versus 5.03) and suggests that the impacts
previously reported are sensitive to the definition of race and ethnicity adopted by the researchers
and should be interpreted accordingly.⁶
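
To see where the two estimates fall relative to the 0.10, 0.05, and 0.01 two-tailed thresholds used in Mayer et al. (2002), the sketch below converts each t-ratio into an approximate p-value and an implied standard error. It assumes a standard normal reference distribution, which is only an approximation to whatever degrees of freedom the original models used; the function names are illustrative.

```python
import math


def two_tailed_p(t_stat: float) -> float:
    """Two-tailed p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def implied_se(impact: float, t_stat: float) -> float:
    """Standard error implied by an estimate and its t-ratio."""
    return impact / t_stat


def significance_label(p: float) -> str:
    """Strictest of the three reporting thresholds that p falls below."""
    for level in (0.01, 0.05, 0.10):
        if p < level:
            return f"significant at {level:.2f}"
    return "not significant at 0.10"


# (impact in percentile points, t-ratio), as quoted from Krueger and Zhu.
estimates = {
    "without covariates": (4.00, 2.50),
    "with covariates": (2.67, 1.78),
}
for label, (impact, t_stat) in estimates.items():
    p = two_tailed_p(t_stat)
    se = implied_se(impact, t_stat)
    print(f"{label}: SE ~ {se:.2f}, p ~ {p:.3f}, {significance_label(p)}")
```

Under this approximation the estimate with covariates clears the 0.10 threshold but not the 0.05 threshold, which is the sense in which the result depends on the significance convention adopted.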



FURTHER EXPANDING THE ANALYSIS TO INCLUDE MORE STUDENTS 

Krueger and Zhu show that including students who were kindergarteners when they applied 
for a voucher further reduces the impacts for African American students. Depending on the 
statistical model and definition of African American used, the composite test score impacts are 
small or the impacts cannot be distinguished from zero. 

Our research team considered whether to include kindergarteners when estimating test score 
impacts, but ultimately decided to exclude them. As the analysis plans developed in the early 
stages of the SCSF evaluation, we decided to compute impacts for all students and for students in 
each of the grade cohorts, because there was some speculation that we might find impacts among 
older students and not the youngest. When comparing baseline test scores for students in the
treatment and control groups we observed some differences among cohorts, for example, which 



⁶ In Mayer et al. (2002), we used as a minimum threshold for statistical significance a two-tailed
statistical test with the probability of rejecting the hypothesis of no impact by chance alone
set at 0.10. We also indicated in the report whether an impact was significant at the 0.05 level or
the 0.01 level. We use the same conventions here. Krueger and Zhu suggest that, because the
t-ratio is 1.78 for the estimated impact of 2.67, results for grades 1 through 4 are fragile if
their definition of race is used.
suggested a need to statistically adjust for the pretest scores when computing impacts.⁷
Furthermore, given that the sample sizes of the individual cohorts were somewhat small, we 
believed that using the pretest score would improve the precision with which we could estimate 
impacts within cohorts. Given that we had not collected baseline test scores for kindergarteners 
because of logistical concerns, we opted to exclude them when computing impacts in test 
scores.⁸



SUMMARY 

What do we conclude from Krueger and Zhu’s reanalysis and the analyses presented by 
Mayer et al.? First, the findings suggest that, among those students who applied for a voucher 
while in grades 1 through 4 and for whom we had baseline test scores, the offer of a voucher had 
a small positive impact on the achievement of African American students no matter which of the
definitions of African American discussed above is used; however, the impacts are concentrated among the
oldest students (the grade 4 cohort). This suggests that the eldest grade could be an outlier. 
Second, when the sample is broadened to include the kindergarteners or others without baseline 
test scores as Krueger and Zhu did, the average impacts for students in K through 4 become
smaller and in some statistical models cannot be distinguished from zero. Third, when the 
definition of race is broadened, and kindergarteners and other students without baseline scores 
are included, the impact of a voucher offer is not statistically significant. 

At the time of the second follow-up report, MPR cautioned readers about placing too much 
emphasis on the average impact for African American students in grades 1 through 4 because 
“much of the overall impact of a voucher on African American students’ achievement is 
concentrated among students who were in 6th grade” (the grade 4 cohort which was in grade 7 
by the conclusion of the study). Although in the final report, which was based on an additional 
round of test score and survey data, the impacts across grade cohorts 1 through 4 appeared more 
in line with one another, the new evidence presented by Krueger and Zhu suggests that one must 
remain cautious when interpreting the findings for African Americans. 



⁷ Krueger and Zhu suggest that reliance on this approach for adjusting for baseline
differences between students in the treatment group and the control group may be misplaced, 
because if there are chance differences between students in the treatment group and the control 
group, then there could also be a spurious correlation between the pretest score and the follow-up 
test score that may bias the estimated treatment effect. 

⁸ Our research team debated many times whether we should continue to give achievement
tests to students in the kindergarten cohort after we made the initial decision to exclude the 
kindergarteners from the test score analyses. Each time we concluded that given that we were 
collecting test data from all cohorts simultaneously in group settings the additional costs to the 
evaluation for collecting the test data were not substantial and believed that the full range of test 
data for all students could be useful for future analyses. Perhaps most importantly, we were
concerned that treating families with children in different cohorts differently would be
perceived as inequitable and would affect response rates. We also note that regardless of our
decisions concerning the exclusion of students from the evaluation sample, SCSF always
planned to continue to fund their vouchers. SCSF was interested both in providing vouchers to
students from low-income families so that they could attend private schools and in learning about
the impacts of vouchers on students’ achievement and related outcomes.
TABLE 1

OFFER OF A VOUCHER YEAR THREE COMPOSITE TEST SCORE IMPACTS
(Percentile)

                               Scholarship   Control   Scholarship
                                   Offered     Group  Offer Impact   t-stat       (N)

All students:
  Overall - Average                  27.84     26.90          0.95     0.81      1250
  Grades 4+5                         26.56     26.55          0.01     0.01       680
  Grades 6+7                         29.47     27.41          2.06     1.43       570
  Grade 4                            27.51     26.90          0.61     0.22       331
  Grade 5                            25.65     26.29         -0.64    -0.36       349
  Grade 6                            29.52     27.83          1.69     0.90       322
  Grade 7                            29.41     26.96          2.45     1.15       248

African-American students:
  Overall - Average                  26.63     21.59          5.03     3.09       519
  Grades 4+5                         25.26     20.71          4.55     1.98       283
  Grades 6+7                         28.27     23.31          4.96     2.48       236
  Grade 4                            27.66     23.38          4.28     0.88       127
  Grade 5                            22.95     20.85          2.10     0.96       156
  Grade 6                            28.19     23.71          4.47     1.56       130
  Grade 7                            28.36     21.40          6.96     2.07       106

Hispanic students:
  Overall - Average                  27.65     28.65         -1.00    -0.63       637
  Grades 4+5                         25.71     27.44         -1.73    -0.77       347
  Grades 6+7                         30.10     29.93          0.17     0.08       290
  Grade 4                            26.93     26.55          0.38     0.11       178
  Grade 5                            24.47     27.94         -3.47    -1.09       169
  Grade 6                            30.07     28.73          1.34     0.49       170
  Grade 7                            30.14     29.30          0.84     0.24       120

*   Impact is statistically significant at .10 level, two-tailed test
**  Impact is statistically significant at .05 level, two-tailed test
*** Impact is statistically significant at .01 level, two-tailed test
TABLE 2

OFFER OF A VOUCHER YEAR THREE MATH TEST SCORE IMPACTS
(Percentile)

                               Scholarship   Control   Scholarship
                                   Offered     Group  Offer Impact   t-stat       (N)

All students:
  Overall - Average                  28.04     26.55          1.49     1.13      1250
  Grades 4+5                         26.14     25.85          0.30     0.15       680
  Grades 6+7                         30.46     27.68          2.78     1.70       570
  Grade 4                            26.83     25.30          1.53     0.48       331
  Grade 5                            25.48     26.00         -0.52    -0.24       349
  Grade 6                            29.86     26.63          3.23     1.43       322
  Grade 7                            31.23     28.92          2.31     0.92       248

African-American students:
  Overall - Average                  26.45     20.03          6.42     3.49 ***   519
  Grades 4+5                         24.17     18.25          5.92     2.38 **    283
  Grades 6+7                         29.17     23.56          5.62     2.37 **    236
  Grade 4                            26.57     21.47          5.10     0.92       127
  Grade 5                            21.85     17.47          4.39     1.73       156
  Grade 6                            27.44     23.49          3.95     1.08       130
  Grade 7                            31.22     22.74          8.47     2.34 **    106

Hispanic students:
  Overall - Average                  28.13     28.43         -0.29    -0.16       637
  Grades 4+5                         25.98     27.42         -1.44    -0.55       347
  Grades 6+7                         30.84     30.20          0.64     0.28       290
  Grade 4                            26.67     24.69          1.98     0.52       178
  Grade 5                            25.28     29.37         -4.09    -1.06       169
  Grade 6                            31.15     26.56          4.60     1.47       170
  Grade 7                            30.38     33.82         -3.44    -0.84       120

*   Impact is statistically significant at .10 level, two-tailed test
**  Impact is statistically significant at .05 level, two-tailed test
*** Impact is statistically significant at .01 level, two-tailed test
TABLE 3

OFFER OF A VOUCHER YEAR THREE READING TEST SCORE IMPACTS
(Percentile)

                               Scholarship   Control   Scholarship
                                   Offered     Group  Offer Impact   t-stat       (N)

All students:
  Overall - Average                  27.64     27.24          0.40     0.30      1250
  Grades 4+5                         26.98     27.26         -0.27    -0.14       680
  Grades 6+7                         28.48     27.14          1.34     0.77       570
  Grade 4                            28.19     28.50         -0.31    -0.11       331
  Grade 5                            25.82     26.58         -0.76    -0.35       349
  Grade 6                            29.18     29.03          0.15     0.07       322
  Grade 7                            27.58     25.00          2.58     1.02       248

African-American students:
  Overall - Average                  26.81     23.16          3.65     1.89 *     519
  Grades 4+5                         26.35     23.17          3.18     1.13       283
  Grades 6+7                         27.36     23.06          4.30     1.89 *     236
  Grade 4                            28.74     25.29          3.45     0.64       127
  Grade 5                            24.05     24.23         -0.19    -0.06       156
  Grade 6                            28.93     23.94          5.00     1.56       130
  Grade 7                            25.50     20.07          5.44     1.48       106

Hispanic students:
  Overall - Average                  27.17     28.88         -1.71    -0.97       637
  Grades 4+5                         25.44     27.45         -2.01    -0.80       347
  Grades 6+7                         29.35     29.66         -0.31    -0.13       290
  Grade 4                            27.19     28.41         -1.22    -0.32       178
  Grade 5                            23.67     26.52         -2.86    -0.85       169
  Grade 6                            28.99     30.91         -1.92    -0.58       170
  Grade 7                            29.89     24.78          5.12     1.30       120

*   Impact is statistically significant at .10 level, two-tailed test
**  Impact is statistically significant at .05 level, two-tailed test
*** Impact is statistically significant at .01 level, two-tailed test